intrinsic motivation
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Europe > France (0.04)
- Oceania > Australia (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (2 more...)
- North America > Dominican Republic (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- North America > United States > Massachusetts (0.04)
- North America > Canada > Ontario > Toronto (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New Hampshire > Hillsborough County > Nashua (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- North America > Canada > Ontario > Toronto (0.04)
LECO: Learnable Episodic Count for Task-Specific Intrinsic Reward
Episodic count has been widely used to design a simple yet effective intrinsic motivation for reinforcement learning with a sparse reward. However, the use of episodic count in a high-dimensional state space as well as over a long episode time requires a thorough state compression and fast hashing, which hinders rigorous exploitation of it in such hard and complex exploration environments. Moreover, the interference from task-irrelevant observations in the episodic count may cause its intrinsic motivation to overlook task-related important changes of states, and the novelty in an episodic manner can lead to repeatedly revisit the familiar states across episodes. In order to resolve these issues, in this paper, we propose a learnable hash-based episodic count, which we name LECO, that efficiently performs as a task-specific intrinsic reward in hard exploration problems. In particular, the proposed intrinsic reward consists of the episodic novelty and the task-specific modulation where the former employs a vector quantized variational autoencoder to automatically obtain the discrete state codes for fast counting while the latter regulates the episodic novelty by learning a modulator to optimize the task-specific extrinsic reward. The proposed LECO specifically enables the automatic transition from exploration to exploitation during reinforcement learning. We experimentally show that in contrast to the previous exploration methods LECO successfully solves hard exploration problems and also scales to large state spaces through the most difficult tasks in MiniGrid and DMLab environments.
Information is Power: Intrinsic Control via Information Capture
Humans and animals explore their environment and acquire useful skills even in the absence of clear goals, exhibiting intrinsic motivation. The study of intrinsic motivation in artificial agents is concerned with the following question: what is a good general-purpose objective for an agent? We study this question in dynamic partially-observed environments, and argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model. This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states. We instantiate this approach as a deep reinforcement learning agent equipped with a deep variational Bayes filter. We find that our agent learns to discover, represent, and exercise control of dynamic objects in a variety of partially-observed environments sensed with visual observations without extrinsic reward.
- Oceania > Australia (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (2 more...)